DNA methylation clocks for dogs and humans

Significance Epigenetic estimators of age (known as clocks) allow one to identify interventions that slow or reverse aging. Previous epigenetic clocks only applied to one species at a time. Here, we describe epigenetic clocks that apply to both dogs and humans. These clocks, which measure methylation levels in highly conserved stretches of the DNA, promise to increase the likelihood that interventions that reverse epigenetic age in one species will have the same effect in the other.


Supplementary Figures
Supplementary Figure S1. Unsupervised hierarchical clustering reveals that canine methylation profiles cluster primarily by sex. Canine blood samples (n = 742 representing 93 breeds) cluster into a single group at a height cut-off of 0.04 (indicated by the 'branch' covariate), and separate into two groups primarily driven by sex rather than breed. Average linkage hierarchical clustering is based on the inter-array correlation. The dissimilarity measure is defined as one minus the Pearson correlation coefficient. For age covariate, white indicates low values and red indicates high values. PCs 1 through 7 were removed from the original data by projecting samples against the orthogonal PC space. Breeds with three or fewer samples were excluded from this analysis. a) Two-dimensional representation of the PC-reduced methylation dataset using UMAP (1). Samples belonging to the same breed are joined together by a semitransparent polygon. b, c) Average number of correct and incorrect breed classifications using elastic net multinomial regression (α=0.5, λ=0.001) (2) across 10 sub-samplings of the PC-reduced panel. Error lines correspond to the average number of correct/incorrect classifications ± standard deviation. The classifier was trained on either b) approximately half the samples from every breed while the remaining half was used for prediction, or c) all but one sample from every breed with the remaining sample used for prediction.

Supplementary Figure S3. Association of age epigenetic clocks with breed characteristics.
Association analysis for age acceleration measured by three clocks: a-d) pure dog epigenetic clock of age; e-h) human-dog clock for chronological age; and i-l) human-dog clock for relative age. All age acceleration measures are with respect to DNAm age estimates, in units of years. For each panel, we report the Pearson correlation estimate (cor) and corresponding Student's ttest p-value. Breeds index and colors are specified in the legend of SI Appendix, Fig. S2.
Supplementary Figure S4. Age effects in methylation across tissues and species. Correlation tests relating CpG methylation to chronological age across different dog and human tissues were compared using a Z statistic (Student's t-test statistic). a) Meta-analysis aggregating the age effect in all human tissues (Stouffer's method) versus dog blood. b-k) Individual human tissue age effects versus dog blood, including b) bone marrow, c) spleen, d) lung, e) kidney, f) heart, g) skeletal muscle, h) adipose, i) liver, j) skin, and k) blood. Each dot corresponds to a CpG on the mammalian array. Positive Z statistic values indicate a positive correlation between age and the CpG, i.e., age-related gain of methylation. Z statistic values larger than 2 (or smaller than -2) correspond approximately to a two-sided correlation test P < 0.05. Figure S5. Individual CpGs that relate to breed lifespan and weight. Scatterplots depicting the top 10 CpGs that relate to both lifespan (left) and weight (right) (see Dataset S4). Each panel reports the CpG and adjacent gene as well the Pearson correlation coefficient (R) and the p-value (in blue). The epigenetic clocks were used by employing a single elastic net regression model analysis (R function glmnet). We use used Leave-one-out analysis (LOO) using a single lambda value. We chose the following parameters for the glmnet R function (Alpha: 0.5, CV Fold: 10, Lambda choice for Clock: 1 standard error above minimum CV-MSE).

Covariates and coefficient values of the dog clocks
The coefficient values of the clocks are specified in Dataset S8.
1) The dog clock for blood is based on 52 CpGs whose coefficient values are specified in the column "Coef.DogBlood". Age transformation=identity, i.e. F(Age)=Age 2) The human dog clock for chronological age is based on 561 CpGs whose coefficient values are specified in the column "Coef.HumanDogLogLinearAge". Age transformation=log-linear described below.
3) The human dog clock for relative age is based on 497 CpGs whose coefficient values are specified in the column "Coef.HumanDogRelativeAge".

General description of age transformation
The human-dog clocks for chronological age used log linear transformations that are similar to those employed for the HUMAN pan tissue (Horvath 2013) (27).
An elastic net regression model (implemented in the glmnet R function) was used to regress a transformed version of age on the beta values in the training data. The glmnet function requires the user to specify two parameters (alpha and beta). Since I used an elastic net predictor, alpha was set to 0.5. But the lambda value of was chosen by applying a 10 fold cross validation to the training data (via the R function cv.glmnet).
The elastic net regression results in a linear regression model whose coefficients b0, b1, . . . , relate to transformed age as follows Note that the intercept term is denoted by b0. The coefficient values can be found in Dataset S8. Based on the coefficient values from the regression model, DNAmAge is estimated as follows , where !" ( ) denotes the mathematical inverse of the function F(.). Thus, the regression model can be used to predict to transformed age value by simply plugging the beta values of the selected CpGs into the formula.

Defining Properties of the log linear transformation
As indicated by its name, the "log-linear" function, has a logarithmic dependence on age before the average age of sexual maturity (of the species) and a linear dependence after Age at Sexual Maturity (of the species). For the human-dog clocks we used the following averages at sexual maturity (in units of years): 13.5 years for humans and 1.83287671232877 years for dogs.
We used a piecewise transformation, parameterized by Age of Sexual Maturity ( ).
The transformation is F(x), given by In order to use this transformation to predict Age on new samples, one needs to use the inverse transformation, F -1 (y), given by !" ( ) = 1 ( + 1.5) * exp( ) − 1.5, ≤ 0 ( + 1.5) + , ≥ 0 For predicting age, apply the inverse transformation to coefficient-weighted sum. That is, where is the vector of coefficients and is the vector of methylation values, with an intercept term.

R Implementation of the log linear transformation
The DNAm Age estimate is estimated in two steps.
First, one forms a weighted linear combination of the CpGs whose details can be found in Dataset S8.
The table reports the probe identifier (cg number) used in the custom Infinium array (HorvathMammalMethylChip40). The weights used in this linear combination are specified in the respective column entitled "Coef.".
The formula assumes that the DNA methylation data measure "beta" values but the formula could be adapted to other ways of generating DNA methylation data. #let's relabel the columns by replacing "Coef" with "DNAm" since the columns contain estimates of age or relative age instead of coefficient values colnames(datPredictions)=gsub(pattern="Coef", replacement="DNAm", x=colnames(datPredictions)) # We need to transform the human dog clock for chronological age using the inverse of the log linear #transformation. For dogs, the age at sexual maturity has to be set to 1.83287671232877 years.

Supplementary Note 3: Details regarding the overlap analysis between dog EWAS and human GWAS.
We related our EWAS results in dogs with a broad category of human GWAS studies: anthropometric traits, behavioral phenotypes, cognitive related traits, fetal growth traits, inflammatory diseases, lipid panel outcomes, metabolic outcomes and diseases, neurodegenerative and neuropsychiatric disorders, longevity, reproductive aging and other age related phenotypes including DNA methylation based biomarkers. All GWAS results are based on meta-analysis across large-scale human studies. For instance, GWAS of anthropometric traits involved more than 200k individuals from multiple ethnic groups, conducted by the GIANT consortium, https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium. We used the MAGENTA software (28) to define gene level p-values for the human genome wide association studies (GWAS) results from a total of 70 large-scale GWAS studies (29)(30)(31)(32)(33)(34)(35)(36)(37)(38)(39)(40)(41)(42)(43)(44)(45)(46)(47). For each GWAS, we focused on the top 2.5% genes for downstream enrichment analysis. Other thresholds would lead to qualitatively similar results. The EWAS-GWAS enrichment analysis was based on genomic-region hypergeometric analysis as described in Materials and Methods. The GWAS summary datasets were downloaded from OpenGWAS (48-50) (https://gwas.mrcieu.ac.uk/) or obtained from the corresponding study groups. Citations to the respective scientific papers are provided below in Table SN1.

Supplementary Note 4: CpG methylation influence on nearby gene transcription.
We leveraged published canine RNA-seq and ChIP-seq data (See Table SN2  The RNA-seq data was mapped against the CanFam3.1 reference gene annotation and processed according to the GTEx TOPMed pipeline, as detailed in (51). As a QC metric, we observed clustering by RNA-seq tissue regardless of study or coverage and found concordance between the transcription levels derived in our analysis, those obtained from the original publications and the human GTEx atlas (human-dog transcription level correlations for 26,204 genes: ρblood=0.78, ρskin=0.83, ρtestes=0.77, ρbrain=0.82). The ChIP-seq data was not modified, as it was already processed (52) and mapped to the CanFam3.1 genome build.
In concordance with their enrichment in polycomb repressive complex 2 (PRC2) binding signals, the genes assigned to age-associated CpGs were found to be more often untranscribed in all adult tissues compared to the methylation array transcriptional background (Fisher exact test P = 2.2x10-16 and (See Table SN3 below). Genes assigned to lifespan and weight CpGs had, on average, higher expression levels than the array background across all tissues (Wilcoxon rank sum test P = 9.5x10-15). Finally, we sought to determine the degree to which the average methylation score across all samples can predict average blood transcription levels for age-, weight-and lifespan-associated CpGs. We fitted a binomial linear regression model encoding presence or absence of transcription including, as response variables, average methylation, average H3K4me3 and H3K4me1 peak intensities for overlapping CpGs, and repressive chromatin state marks (H3K27me3) annotated from the human reference (See Table SN4 below). We found that methylation is a highly significant factor in all cases, but the predictive power of the model, measured as the odds ratio of the model's confusion matrix, was not particularly high. We did not find a significant correlation between transcription intensity and CpG methylation percentage in transcribed genes. We observe that, while assignation of CpGs to nearby genes based on human synteny blocks (GREAT) provides a sensible framework for the enrichment and association analyses in our manuscript, not all the reported CpGs are expected to influence the transcription of their respective annotated genes. Additionally, we note that human universal chromatin state maps, as used in our manuscript, can explain up to 26.8 percent of transcription variation (P < 10 -200 ) in dog blood and therefore represent an acceptable proxy for general transcriptional activity in the absence of RNAseq data.